A Framework for Fault Tolerance Techniques in the Analysis and Evaluation of Computing Systems

نویسندگان

  • Hodjat Hamidi
  • Abbas Vafaei
  • Amirhassan Monadjemi
چکیده

The Algorithm Based Fault Tolerance (ABFT) approach transforms a system that does not tolerate a specific type of faults, called the fault-intolerant system, to a system that provides a specific level of fault tolerance, namely safety and/or recovery. ABFT techniques are most effective when employing a systematic form. The error detection is employed based on a high-rate real convolution code. This paper addresses new methods for performing error correction when real number codes are involved. The parity values are determined according to a systematic real convolution code. Detection relies on two sets of parity values which are computed in two different ways, one set from the input data but with a simplified combined processing subsystem, and the other set directly from the output processed data, employing the parity definitions directly. The ABFT philosophy leads directly to a model from which error correction can be developed. By employing an ABFT scheme with effective real convolution code, the design allows high throughput as well as high fault coverage. The simulations show that the great difference between the round-off error and the computer-induced error is large enough to be distinguished.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

An approach to fault detection and correction in design of systems using of Turbo ‎codes‎

We present an approach to design of fault tolerant computing systems. In this paper, a technique is employed that enable the combination of several codes, in order to obtain flexibility in the design of error correcting codes. Code combining techniques are very effective, which one of these codes are turbo codes. The Algorithm-based fault tolerance techniques that to detect errors rely on the c...

متن کامل

رویکردی برای حفاظت از عملیات های پردازش داده در سیستم های محاسباتی با استفاده از کدهای کانولوشن

Abstract We present a framework for algorithm-based fault tolerance methods in the design of fault tolerant computing systems. The ABFT error detection technique relies on the comparison of parity values computed in two ways. The parallel processing of input parity values produce output parity values comparable with parity values regenerated from the original processed outputs. Number data proc...

متن کامل

Stability Assessment Metamorphic Approach (SAMA) for Effective Scheduling based on Fault Tolerance in Computational Grid

Grid Computing allows coordinated and controlled resource sharing and problem solving in multi-institutional, dynamic virtual organizations. Moreover, fault tolerance and task scheduling is an important issue for large scale computational grid because of its unreliable nature of grid resources. Commonly exploited techniques to realize fault tolerance is periodic Checkpointing that periodically ...

متن کامل

A Genetic Based Resource Management Algorithm Considering Energy Efficiency in Cloud Computing Systems

Cloud computing is a result of the continuing progress made in the areas of hardware, technologies related to the Internet, distributed computing and automated management. The Increasing demand has led to an increase in services resulting in the establishment of large-scale computing and data centers, in addition to high operating costs and huge amounts of electrical power consumption. Insuffic...

متن کامل

Novel Defect Terminolgy Beside Evaluation And Design Fault Tolerant Logic Gates In Quantum-Dot Cellular Automata

Quantum dot Cellular Automata (QCA) is one of the important nano-level technologies for implementation of both combinational and sequential systems. QCA have the potential to achieve low power dissipation and operate high speed at THZ frequencies. However large probability of occurrence fabrication defects in QCA, is a fundamental challenge to use this emerging technology. Because of these vari...

متن کامل

Improving the palbimm scheduling algorithm for fault tolerance in cloud computing

Cloud computing is the latest technology that involves distributed computation over the Internet. It meets the needs of users through sharing resources and using virtual technology. The workflow user applications refer to a set of tasks to be processed within the cloud environment. Scheduling algorithms have a lot to do with the efficiency of cloud computing environments through selection of su...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2012